pacman::p_load(sf, sfdep, tmap, tidyverse, plotly, zoo, Kendall, kableExtra, ggrain, DT)
As urban infrastructures become increasingly digitized, data from sources like buses, taxis, and public utilities offer valuable insights into movement patterns over time and space. The widespread use of technologies like GPS and RFID in vehicles has generated massive datasets, including route and ridership data. Analyzing these datasets can reveal important patterns and characteristics of human movement in a city, benefiting urban management and transport service providers.
This study aims to leverage Exploratory Spatial Data Analysis (ESDA), specifically Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA), to uncover spatial and spatio-temporal mobility patterns among public bus passengers in Singapore.
The R packages used for the analysis are as follows:
sf: Imports, manages, and processes vector-based geospatial data using the simple features standard.
sfdep: Analyzes and models spatial dependencies, providing neighbour, weight, and LISA functions with an sf-friendly interface.
tmap: Creates thematic maps for visualizing geospatial data.
tidyverse: A collection of R packages with a unified approach for data manipulation and visualization.
plotly: Enables interactive and dynamic data visualizations.
zoo: Handles and analyzes time series data.
Kendall: Computes Kendall’s tau rank correlation coefficient for assessing rank-based associations.
kableExtra: Enhances ‘knitr’ package’s ‘kable()’ function for styling HTML and LaTeX tables in R Markdown. It offers advanced formatting options like row/column customization, conditional styling, and captioning, elevating tables to publication quality.
ggrain: Creates raincloud plots following the grammar of graphics (i.e., ggplot2).
DT: Creates interactive HTML tables.
# Load each csv file into R separately
bus08 <- read_csv("data/aspatial/origin_destination_bus_202308.csv")
bus09 <- read_csv("data/aspatial/origin_destination_bus_202309.csv")
bus10 <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
# Combine all rows into single dataframe
origind <- rbind(bus08, bus09, bus10)
str(origind)
spc_tbl_ [17,118,005 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ YEAR_MONTH : chr [1:17118005] "2023-08" "2023-08" "2023-08" "2023-08" ...
$ DAY_TYPE : chr [1:17118005] "WEEKDAY" "WEEKENDS/HOLIDAY" "WEEKENDS/HOLIDAY" "WEEKDAY" ...
$ TIME_PER_HOUR : num [1:17118005] 16 16 14 14 17 17 17 17 7 17 ...
$ PT_TYPE : chr [1:17118005] "BUS" "BUS" "BUS" "BUS" ...
$ ORIGIN_PT_CODE : chr [1:17118005] "04168" "04168" "80119" "80119" ...
$ DESTINATION_PT_CODE: chr [1:17118005] "10051" "10051" "90079" "90079" ...
$ TOTAL_TRIPS : num [1:17118005] 7 2 3 10 5 4 3 22 3 3 ...
- attr(*, "spec")=
.. cols(
.. YEAR_MONTH = col_character(),
.. DAY_TYPE = col_character(),
.. TIME_PER_HOUR = col_double(),
.. PT_TYPE = col_character(),
.. ORIGIN_PT_CODE = col_character(),
.. DESTINATION_PT_CODE = col_character(),
.. TOTAL_TRIPS = col_double()
.. )
- attr(*, "problems")=<externalptr>
| YEAR_MONTH | DAY_TYPE | TIME_PER_HOUR | PT_TYPE | ORIGIN_PT_CODE | DESTINATION_PT_CODE | TOTAL_TRIPS |
|---|---|---|---|---|---|---|
| 2023-08 | WEEKDAY | 16 | BUS | 04168 | 10051 | 7 |
| 2023-08 | WEEKENDS/HOLIDAY | 16 | BUS | 04168 | 10051 | 2 |
| 2023-08 | WEEKENDS/HOLIDAY | 14 | BUS | 80119 | 90079 | 3 |
| 2023-08 | WEEKDAY | 14 | BUS | 80119 | 90079 | 10 |
| 2023-08 | WEEKENDS/HOLIDAY | 17 | BUS | 44069 | 17229 | 5 |
| 2023-08 | WEEKDAY | 17 | BUS | 44069 | 17229 | 4 |
| 2023-08 | WEEKENDS/HOLIDAY | 17 | BUS | 20281 | 20141 | 3 |
| 2023-08 | WEEKDAY | 17 | BUS | 20281 | 20141 | 22 |
| 2023-08 | WEEKDAY | 7 | BUS | 19051 | 10017 | 3 |
| 2023-08 | WEEKENDS/HOLIDAY | 17 | BUS | 11169 | 04219 | 3 |
YEAR_MONTH: Data collection month in the format of YYYY-MM.
DAY_TYPE: Weekday or Weekends/Holiday.
TIME_PER_HOUR: Hour of the day in 24 hour format.
PT_TYPE: Type of public transportation.
ORIGIN_PT_CODE: Identifier for the bus stop where the trip originated.
DESTINATION_PT_CODE: Identifier for the bus stop where the trip ended.
TOTAL_TRIPS: Total number of trips recorded for each origin-destination pair.
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE
Length:17118005 Length:17118005 Min. : 0.00 Length:17118005
Class :character Class :character 1st Qu.:10.00 Class :character
Mode :character Mode :character Median :14.00 Mode :character
Mean :14.06
3rd Qu.:18.00
Max. :23.00
ORIGIN_PT_CODE DESTINATION_PT_CODE TOTAL_TRIPS
Length:17118005 Length:17118005 Min. : 1.00
Class :character Class :character 1st Qu.: 2.00
Mode :character Mode :character Median : 4.00
Mean : 20.46
3rd Qu.: 12.00
Max. :36668.00
[1] "Unique origins: 5075"
[1] "Unique destinations: 5079"
[1] 7977626
Most of the columns are stored with a Class and Mode of “character”.
Over a three-month period, a total of 17,118,005 combinations of bus trips were recorded.
There are 5075 unique origin bus stops, and 5079 unique destination bus stops
On average, each route records 20.46 trips, but the highest recorded number of trips for a single route is an exceptional 36,668. This significant discrepancy suggests outliers or anomalies in the data, warranting additional investigation.
In this dataset of some 17 million records, the low median of 4, the high mean of 20.46, the maximum of 36,668, and the more than 7.9 million records exceeding the median together indicate a right-skewed distribution: most routes carry few trips, with a long tail of very high-volume routes.
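The skew diagnostics above can be reproduced with a small helper; a minimal sketch using a toy vector (on the real data, the same function would be called on origind$TOTAL_TRIPS):

```r
# For a right-skewed variable, the mean sits well above the median,
# the maximum dwarfs both, and many observations exceed the median
skew_summary <- function(x) {
  list(
    median = median(x),
    mean = mean(x),
    max = max(x),
    n_above_median = sum(x > median(x))
  )
}

# Toy right-skewed vector standing in for origind$TOTAL_TRIPS
toy <- c(rep(1, 5), rep(4, 3), 50, 1000)
skew_summary(toy)
```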
Reading layer `BusStop' from data source
`C:\Zackkoh94\ISSS624\Take-home_Ex1\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
Rows: 5,161
Columns: 4
$ BUS_STOP_N <fct> 22069, 32071, 44331, 96081, 11561, 66191, 23389, 54411, 285…
$ BUS_ROOF_N <fct> B06, B23, B01, B05, B05, B03, B02A, B02, B09, B01, B16, B02…
$ LOC_DESC <fct> OPP CEVA LOGISTICS, AFT TRACK 13, BLK 239, GRACE INDEPENDEN…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
The EPSG.io database indicates that the projected coordinate system for Singapore is SVY21, associated with EPSG code 3414. However, the busstop dataset was read in with its CRS recorded only as “SVY21”, without the EPSG 3414 identifier (the 9001 codes in the WKT above refer to the metre length unit, not the CRS), so the correct EPSG code needs to be assigned.
# Setting the CRS for the busstop data to EPSG 3414
busstop <- st_set_crs(busstop, 3414) %>%
# Changing the column name for easier integration with main dataframe
mutate(
ORIGIN_PT_CODE = as.factor(BUS_STOP_N)
) %>%
# Keeping only necessary columns for further analysis
select(
ORIGIN_PT_CODE,
LOC_DESC,
geometry
)
# Verifying the CRS assignment for busstop
st_crs(busstop)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
This project requires the computation of passenger trips within the following peak-hour time frames.
| Peak hour period | Bus tap on time |
|---|---|
| Weekday morning peak | 6am to 9am |
| Weekday afternoon peak | 5pm to 8pm |
| Weekend/holiday morning peak | 11am to 2pm |
| Weekend/holiday evening peak | 4pm to 7pm |
As mentioned before, ORIGIN_PT_CODE and DESTINATION_PT_CODE are stored as characters. Because they identify bus stop locations, they should be converted to factors (a categorical data type) for further analysis.
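A minimal sketch of that conversion (the toy dataframe below stands in for the real origind; on the actual data the same mutate() call would be applied to origind directly):

```r
library(dplyr)

# Toy stand-in for the real origind dataframe (same column names)
origind <- data.frame(
  ORIGIN_PT_CODE = c("04168", "80119"),
  DESTINATION_PT_CODE = c("10051", "90079"),
  stringsAsFactors = FALSE
)

# Convert the bus stop identifier columns from character to factor,
# since they are categorical codes rather than free text
origind <- origind %>%
  mutate(across(c(ORIGIN_PT_CODE, DESTINATION_PT_CODE), as.factor))

str(origind)
```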
origind_agg <- origind %>%
# Classify trips into specific time periods based on day type and hour
mutate(period = ifelse(DAY_TYPE == "WEEKDAY" &
TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9,
"Weekday morning peak",
ifelse(DAY_TYPE == "WEEKDAY" &
TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20,
"Weekday evening peak",
ifelse(DAY_TYPE == "WEEKENDS/HOLIDAY" &
TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14,
"Weekend/PH morning peak",
ifelse(DAY_TYPE == "WEEKENDS/HOLIDAY" &
TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19,
"Weekend/PH evening peak",
"Others"))))
) %>%
# Exclude data outside the specified periods for focused analysis
filter(
period != "Others"
) %>%
# Aggregate the total number of trips by origin bus stop and month for each classified period
group_by(
YEAR_MONTH,
period,
ORIGIN_PT_CODE
) %>%
summarise(
num_trips = sum(TOTAL_TRIPS)
) %>%
ungroup()
We categorize and preprocess the data into the time frames shown above.
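The nested ifelse() calls above can also be written with dplyr::case_when(), which reads top-to-bottom like the table of peak periods; a behaviour-equivalent sketch of just the classification step:

```r
library(dplyr)

# Same period classification as the nested ifelse() above,
# expressed with case_when() for readability
classify_period <- function(day_type, hour) {
  case_when(
    day_type == "WEEKDAY" & hour >= 6 & hour <= 9   ~ "Weekday morning peak",
    day_type == "WEEKDAY" & hour >= 17 & hour <= 20 ~ "Weekday evening peak",
    day_type == "WEEKENDS/HOLIDAY" & hour >= 11 & hour <= 14 ~ "Weekend/PH morning peak",
    day_type == "WEEKENDS/HOLIDAY" & hour >= 16 & hour <= 19 ~ "Weekend/PH evening peak",
    TRUE ~ "Others"
  )
}

classify_period(c("WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKDAY"), c(7, 12, 3))
```

Inside the pipeline this would replace the nested calls with `mutate(period = classify_period(DAY_TYPE, TIME_PER_HOUR))`.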
[[1]]
# A tibble: 60,168 × 4
YEAR_MONTH period ORIGIN_PT_CODE num_trips
<chr> <chr> <fct> <dbl>
1 2023-08 Weekday evening peak 01012 8448
2 2023-08 Weekday evening peak 01013 7328
3 2023-08 Weekday evening peak 01019 3608
4 2023-08 Weekday evening peak 01029 9317
5 2023-08 Weekday evening peak 01039 12937
6 2023-08 Weekday evening peak 01059 2133
7 2023-08 Weekday evening peak 01109 322
8 2023-08 Weekday evening peak 01112 45010
9 2023-08 Weekday evening peak 01113 27233
10 2023-08 Weekday evening peak 01119 9323
# ℹ 60,158 more rows
# Count of dupes
count_duplicate_rows <- function(df, df_name) {
df %>%
group_by_all() %>%
filter(n() > 1) %>%
ungroup() %>%
summarise(df_name = df_name, row_count = n())
}
duplicate1 <- count_duplicate_rows(bus08, "bus08")
duplicate2 <- count_duplicate_rows(bus09, "bus09")
duplicate3 <- count_duplicate_rows(bus10, "bus10")
all_duplicates <- bind_rows(duplicate1, duplicate2, duplicate3)
# Output the combined counts
print(all_duplicates)
# A tibble: 3 × 2
df_name row_count
<chr> <int>
1 bus08 0
2 bus09 0
3 bus10 0
There are no duplicate rows that could adversely impact the join operations in the later part of the project.
# Function to count the number of rows containing null values
count_null_rows <- function(df, df_name) {
df %>%
filter(if_any(everything(), is.na)) %>%
summarise(df_name = df_name, row_count = n())
}
# Apply the function to each dataframe to count rows with nulls
nulls1 <- count_null_rows(bus08, "bus08")
nulls2 <- count_null_rows(bus09, "bus09")
nulls3 <- count_null_rows(bus10, "bus10")
# Combine the results
all_nulls <- bind_rows(nulls1, nulls2, nulls3)
# Print the counts of rows with nulls
print(all_nulls)
# A tibble: 3 × 2
df_name row_count
<chr> <int>
1 bus08 0
2 bus09 0
3 bus10 0
There are no null values that could adversely impact the join operations in the later part of the project.
Understanding daily passenger trip distribution is crucial for urban traffic management and congestion relief, with geospatial analysis providing insights into areas of dense commuter traffic and patterns.
As a start, we look at the distribution of passenger trips for the different peak periods in September 2023.
# Generate scatter plots for September 2023 bus passenger data
bus09_scatter <- origind_agg %>%
filter(YEAR_MONTH == "2023-09") %>% # Focus on September 2023
ggplot(aes(x = num_trips, y = period, fill = period, color = period)) + # Set plot aesthetics
geom_point(size = 3, alpha = 0.7) + # Create scatter plot
scale_x_continuous( # Configure x-axis scale
labels = scales::number_format(accuracy = 1),
breaks = scales::pretty_breaks(n = 5)) +
scale_fill_brewer(palette = "Set2") + # Use a Brewer color palette
labs(
title = "September 2023: Peak Passenger Volumes in Weekday Mornings",
subtitle = "Variability during weekdays suggests potential congestion") +
theme(
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.y = element_blank(),
axis.text.x = element_blank() # Remove x-axis labels for clarity
) # Adjust plot theme for a clean appearance
# Display the plot
bus09_scatter
Passenger counts are notably higher during weekday peaks than during weekend or public-holiday peaks.
Each peak-hour time frame exhibits a right-skewed distribution, indicating sporadic instances of unusually high passenger numbers.
Such patterns may point to congestion at locations such as the Central Business District (CBD), bus interchanges, or stops near expressways where passengers transfer between services.
Subsequent investigation could ascertain whether these occurrences are geographically clustered, helping to identify probable congestion focal points.
Next, we explore whether the distribution changes across months (i.e., August to October 2023).
# Generate violin plots with jittered points for bus passenger data
origind_scatter <- origind_agg %>%
ggplot(aes(x = period, y = num_trips, fill = period, color = period)) +
geom_violin(position = position_nudge(x = 0, y = .2), alpha = 0.5) + # Create violin plots
geom_point(aes(y = num_trips, color = period),
position = position_jitter(height = .15), size = .5, alpha = 0.8) + # Add jittered points
facet_wrap(~YEAR_MONTH, nrow = 1) + # Facet by YEAR_MONTH for comparison
scale_y_continuous(labels = scales::number_format(accuracy = 1),
breaks = scales::pretty_breaks(n = 3)) + # Adjust y-axis scale
labs(title = "Passenger Volume Distribution: Aug - Oct 2023") + # Add a descriptive title
theme(
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_blank() # Remove x-axis labels for clarity
)
# Display the plot
origind_scatter
The plots indicate a consistent trip distribution across the three months, suggesting the presence of bus stops with persistently high passenger volumes, particularly during weekday evenings. This consistency hints at potential congestion hot spots, especially if these stops are clustered. Consequently, focused geospatial analysis of the September 2023 data will be used to identify these areas.
Extract September 2023 data
# Isolate September 2023 data for focused analysis
origind09 <- origind_agg %>%
filter(YEAR_MONTH == "2023-09") %>% # Targeting September 2023
pivot_wider( # Transform data to wide format for easier analysis
names_from = period,
values_from = num_trips
) %>%
select(-YEAR_MONTH) # Exclude YEAR_MONTH column from the final output
# Create a datatable visualization for the transformed data
DT::datatable(origind09,
options = list(pageLength = 5, autoWidth = TRUE),
rownames = FALSE) # Configure table display options
Simple feature collection with 5161 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 10 features:
ORIGIN_PT_CODE LOC_DESC Weekday evening peak
1 22069 OPP CEVA LOGISTICS 44
2 32071 AFT TRACK 13 NA
3 44331 BLK 239 1524
4 96081 GRACE INDEPENDENT CH 274
5 11561 BLK 166 134
6 66191 AFT CORFE PL 325
7 23389 PEC LTD 388
8 54411 BLK 527 1204
9 28531 BLK 536 3529
10 96139 BLK 148 1152
Weekday morning peak Weekend/PH evening peak Weekend/PH morning peak
1 13 2 20
2 NA 1 NA
3 1701 630 707
4 286 99 114
5 152 79 77
6 386 193 234
7 49 81 18
8 2764 649 1612
9 8000 2222 2373
10 4406 735 997
geometry
1 POINT (13576.31 32883.65)
2 POINT (13228.59 44206.38)
3 POINT (21045.1 40242.08)
4 POINT (41603.76 35413.11)
5 POINT (24568.74 30391.85)
6 POINT (30951.58 38079.61)
7 POINT (12476.9 32211.6)
8 POINT (30329.45 39373.92)
9 POINT (14993.31 36905.61)
10 POINT (41642.81 36513.9)
origind09_sf, being a dataframe of spatial points, is not ideal for spatial autocorrelation studies, which require ‘areas’ represented as polygons. The code that follows reorganizes these bus stop points into a hexagonal lattice for better spatial analysis.
The grid-creation code does the following:
The cellsize parameter is measured in the same units as the origind09_sf dataframe’s coordinate system. Since origind09_sf uses EPSG:3414 projection, which has units in meters, this cellsize defines the hexagon’s width, here chosen to be 500 meters.
Each hexagon is given a unique hex_id as the primary key.
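Since the grid-creation chunk is not reproduced here, a sketch of how it could look with sf::st_make_grid() is given below; the names origind09_hx and hex_id follow the text, while the exact call and the toy points are assumptions:

```r
library(sf)
library(dplyr)

# Toy stand-in for origind09_sf: a few projected points in EPSG:3414 (metres)
origind09_sf <- st_as_sf(
  data.frame(x = c(13576, 21045, 30329), y = c(32883, 40242, 39373)),
  coords = c("x", "y"), crs = 3414
)

# Build a hexagonal lattice covering the bus stop points;
# cellsize is in metres because the layer uses EPSG:3414
origind09_hx <- st_make_grid(origind09_sf,
                             cellsize = 500,
                             square = FALSE) %>%  # FALSE -> hexagons instead of squares
  st_sf() %>%
  mutate(hex_id = row_number())  # unique primary key per hexagon
```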
The hexagonal grid is designed to contain multiple bus stops, with each hexagon’s hex_id acting as the key for compiling data. The following steps detail the process for structuring attributes based on hex_id:
Apply st_join() with the st_within join condition to match bus stop points to their respective hexagons.
Use st_set_geometry(NULL) to remove the spatial aspect, shifting the focus to attribute consolidation.
Implement group_by() to consolidate data under a distinct hex_id.
Deploy summarise() to tally the number of bus stops and to sum up trips for each peak period by hexagon.
Convert any NA entries to 0 with replace_na() to clean the dataset. This is just a precaution, since the earlier checks found no nulls.
# Joining bus stops with hexagonal areas and summarizing data
origind09_stops <- origind09_sf %>%
st_join(origind09_hx, join = st_within) %>% # Associate bus stops with hexagons
group_by(hex_id) %>% # Group by hexagon ID
summarise(
total_busstops = n(), # Count bus stops per hexagon
busstop_ids = str_c(ORIGIN_PT_CODE, collapse = ", "), # Combine bus stop codes
loc_descriptions = str_c(LOC_DESC, collapse = "; "), # Combine location descriptions
peak_weekday_morning = sum(`Weekday morning peak`), # Sum for weekday morning peak
peak_weekday_evening = sum(`Weekday evening peak`), # Sum for weekday evening peak
peak_weekend_morning = sum(`Weekend/PH morning peak`), # Sum for weekend morning peak
peak_weekend_evening = sum(`Weekend/PH evening peak`) # Sum for weekend evening peak
) %>%
st_set_geometry(NULL) %>% # Remove spatial component
replace_na(list(peak_weekday_morning = 0, peak_weekday_evening = 0, # Replace NA with 0
peak_weekend_morning = 0, peak_weekend_evening = 0)) %>%
ungroup() # Remove grouping
# Display the processed data
origind09_stops
# A tibble: 1,524 × 8
hex_id total_busstops busstop_ids loc_descriptions peak_weekday_morning
<int> <int> <chr> <chr> <dbl>
1 34 1 25059 AFT TUAS STH BLVD 91
2 65 1 25751 BEF TUAS STH AVE 14 41
3 99 1 26379 YONG NAM 50
4 127 1 25761 OPP REC S'PORE 129
5 129 2 25719, 26389 THE INDEX; BEF TUAS … 1104
6 130 1 26369 SEE HUP SENG 34
7 131 1 26299 BEF TUAS STH AVE 6 41
8 159 1 25741 HALLIBURTON 66
9 160 1 25711 OPP THE INDEX 204
10 161 2 25701, 25709 GLAXOSMITHKLINE; EDG… 842
# ℹ 1,514 more rows
# ℹ 3 more variables: peak_weekday_evening <dbl>, peak_weekend_morning <dbl>,
# peak_weekend_evening <dbl>
This section aims to visualize and make sense of the current data before embarking on more complex methodologies such as Local Indicators of Spatial Association (LISA).
# Switch to tmap interactive viewing mode
tmap_mode("view")
# Creating an interactive map of bus traffic
bus_traffic_09_map <- tm_basemap("CartoDB.Positron") +
tm_shape(bus_traffic_09) + # Connecting tm_shape directly to tm_fill
tm_fill(
col = "total_busstops", # Color based on the number of bus stops
palette = "YlGnBu", # Using the YlGnBu color palette
style = "cont", # Continuous color style
    id = "loc_descriptions", # Identify hexagons by location description
popup.vars = c("Bus Stops Count" = "total_busstops",
"Stop Codes" = "busstop_ids"), # Popup information
title = "Number of Bus Stops" # Title for the color legend
) +
tm_layout(
legend.show = FALSE)
# Display the interactive map
bus_traffic_09_map
The map indicates that the central area, which includes business districts such as the Central Business District and the one-north business park, as well as densely populated residential zones like Sengkang, has a higher concentration of bus stops. These areas are known for their high human traffic, which necessitates a more extensive public transportation network to serve the daily travel needs of commuters.
In contrast, less frequented parts of the island, such as Lim Chu Kang, an area zoned largely for agricultural activities, are marked by lighter shades, highlighting a sparser presence of bus stops.
These less urbanized or more industrial regions typically see lower human traffic.
The distribution of bus stops seen in the map reflects the Land Transport Authority's (LTA) strategic approach to providing an efficient bus service: concentrating access, resources, and connectivity in areas with high human traffic, where public transport demand is higher.
ORIGIN_PT_CODE LOC_DESC Weekday evening peak
11009 : 2 BLK 1 : 6 Min. : 1
22501 : 2 BLK 25 : 6 1st Qu.: 639
43709 : 2 AFT YIO CHU KANG RD: 5 Median : 1797
47201 : 2 BLK 101 : 5 Mean : 4464
51071 : 2 BLK 109 : 5 3rd Qu.: 4080
52059 : 2 (Other) :5126 Max. :481495
(Other):5149 NA's : 8 NA's :170
Weekday morning peak Weekend/PH evening peak Weekend/PH morning peak
Min. : 1 Min. : 1.0 Min. : 1
1st Qu.: 414 1st Qu.: 228.5 1st Qu.: 207
Median : 1946 Median : 715.0 Median : 742
Mean : 4560 Mean : 1707.7 Mean : 1689
3rd Qu.: 5430 3rd Qu.: 1666.0 3rd Qu.: 1889
Max. :328545 Max. :158693.0 Max. :112330
NA's :176 NA's :202 NA's :180
geometry
POINT :5161
epsg:3414 : 0
+proj=tmer...: 0
Reinforcing the statements above, the summary confirms that the data is right-skewed; the breaks for the following maps are therefore adjusted to better illustrate comparisons between peak-hour time frames.
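One way to sanity-check such hand-picked breaks is to compare them with quantiles of the mapped variable, since equal-interval breaks hide most of the variation in right-skewed data; a base-R sketch with a toy vector standing in for a peak-period column:

```r
# Toy right-skewed trip counts standing in for e.g. peak_weekday_morning
trips <- c(rep(100, 50), rep(1000, 30), rep(5000, 15), 50000, 300000)

# Quantile-based candidate breaks spread the classes across the bulk of
# the values instead of lumping nearly everything into the first class
quantile(trips, probs = c(0, 0.25, 0.5, 0.75, 0.9, 0.99, 1))
```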
peak_weekday_morning_map <- tm_basemap("CartoDB.Positron") +
tm_shape(bus_traffic_09) +
tm_fill(
col = "peak_weekday_morning",
palette = "YlGnBu",
title = "Peak Weekday Morning Traffic",
id = "loc_descriptions",
showNA = FALSE,
alpha = 0.6,
breaks = c(0, 500, 1000, 2000, 3000, 5000, 10000, 50000, 100000, 300000, 550000)
) +
tm_borders() +
tm_layout(
legend.show = TRUE,
legend.position = c("right", "top"),
legend.width = 0.2,
legend.height = 0.5
)
peak_weekday_morning_map
peak_weekday_evening_map <- tm_basemap("CartoDB.Positron") +
tm_shape(bus_traffic_09) +
tm_fill(
col = "peak_weekday_evening",
palette = "YlOrRd",
title = "Peak Weekday Evening Traffic",
id = "loc_descriptions",
showNA = FALSE,
alpha = 0.6,
breaks = c(0, 500, 1000, 2000, 3000, 5000, 10000, 50000, 100000, 300000, 550000)
) +
tm_borders() +
tm_layout(
legend.show = TRUE,
legend.position = c("right", "top"),
legend.width = 0.2,
legend.height = 0.5
)
peak_weekday_evening_map
peak_weekend_morning_map <- tm_basemap("CartoDB.Positron") +
  tm_shape(bus_traffic_09) +
  tm_fill(
    col = "peak_weekend_morning",
    palette = "RdPu",
    title = "Peak Weekend Morning Traffic",
    id = "loc_descriptions",
    showNA = FALSE,
    alpha = 0.6,
    breaks = c(0, 500, 1000, 2000, 3000, 5000, 10000, 50000, 100000, 300000, 550000)
  ) +
  tm_borders() +
  tm_layout(
    legend.show = TRUE,
    legend.position = c("right", "top"),
    legend.width = 0.2,
    legend.height = 0.5
  )
peak_weekend_morning_map
# Creating a static map of peak weekend evening bus traffic
peak_weekend_evening_map <- tm_basemap("CartoDB.Positron") +
tm_shape(bus_traffic_09) +
tm_fill(
col = "peak_weekend_evening",
palette = "Blues",
title = "Peak Weekend Evening Traffic",
id = "loc_descriptions",
showNA = FALSE,
alpha = 0.6,
breaks = c(0, 500, 1000, 2000, 3000, 5000, 10000, 50000, 100000, 300000, 550000)
) +
tm_borders() +
tm_layout(legend.show = TRUE)
# Display the map
peak_weekend_evening_map
Unsurprisingly, weekday trip volumes surpass those of weekends/holidays, most plausibly due to the working crowd. Nonetheless, regardless of the peak-hour time frame, the areas that see higher human traffic remain unchanged, highlighting the efficacy of the Singapore bus network in serving the needs of commuters.
library(ggplot2)
library(gridExtra)
# Function to create a scatter plot for a given peak-period column
create_scatter_plot <- function(data, y_column_name, title, color = "blue") {
  ggplot(data, aes(x = total_busstops, y = .data[[y_column_name]])) + # .data pronoun replaces the deprecated aes_string()
    geom_point(alpha = 0.7, color = color) +
    scale_y_continuous(limits = c(0, 500000), # single y scale: limits and breaks set together instead of a separate ylim()
                       breaks = scales::pretty_breaks(n = 6)) +
    labs(title = title, x = "Number of Bus Stops", y = "") +
    theme(panel.grid = element_blank())
}
# Creating individual scatter plots
morning_peak_weekday <- create_scatter_plot(bus_traffic_09, "peak_weekday_morning", "Weekday Morning Peak", "blue")
evening_peak_weekday <- create_scatter_plot(bus_traffic_09, "peak_weekday_evening", "Weekday Evening Peak", "blue")
morning_peak_weekend <- create_scatter_plot(bus_traffic_09, "peak_weekend_morning", "Weekend/PH Morning Peak", "green")
evening_peak_weekend <- create_scatter_plot(bus_traffic_09, "peak_weekend_evening", "Weekend/PH Evening Peak", "green")
# Combining the plots using grid.arrange
combined_plot <- grid.arrange(morning_peak_weekday, evening_peak_weekday, morning_peak_weekend, evening_peak_weekend, ncol = 2, nrow = 2)
The volume of passenger trips appears greatest in hexagons containing 4 to 8 bus stops. Conversely, the areas with the most bus stops (9 to 11) did not experience the highest passenger traffic during any peak period.
This suggests there may be an optimal number of bus stops within each 500 m hexagon for alleviating congestion; one possible measure is to review bus stop placement so that additional stops do not yield diminishing returns.
In preparing for spatial autocorrelation analysis, we first establish a spatial weights matrix to map the interconnections between hexagonal spatial units by their distance. Key to this setup is ensuring that each unit has at least one, but not all, other units as neighbors to reflect diverse spatial relationships accurately.
Given our dataset’s skewed distribution, we’ve chosen to assign 10 neighbors per feature, exceeding the suggested minimum of eight. This approach enhances the robustness of our spatial connectivity analysis.
Our study area, marked by unevenly distributed bus stops and zones with low residential or business density, leads us to prefer distance-based methods over contiguity-based ones. This choice aligns better with our data’s spatial characteristics.
We adopt the Adaptive Distance-Based method, with a fixed number of neighbors, to accommodate our dataset’s right-skewed distribution. This method ensures each hexagon connects with exactly 10 neighbors. We employ the st_knn function to identify neighbors and st_weights to create row-standardized spatial weights, setting a solid foundation for our spatial autocorrelation analysis.
# Adjusting bus_traffic_09 data with spatial weights
weightmat_all <- bus_traffic_09 %>%
mutate(
knn = st_knn(geometry, k = 10), # Calculating 10 nearest neighbors
weight_type = st_weights(knn, style = 'W'), # Generating weights
.before = 1 # Placing the new columns at the beginning
)
# check the output
kable(head(weightmat_all))
| knn | weight_type | hex_id | total_busstops | busstop_ids | loc_descriptions | peak_weekday_morning | peak_weekday_evening | peak_weekend_morning | peak_weekend_evening | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 2, 4, 5, 8, 9, 12, 16, 22, 23, 38 | 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 | 34 | 1 | 25059 | AFT TUAS STH BLVD | 91 | 348 | 0 | 96 | POLYGON ((3970.122 27925.48... |
| 1, 4, 5, 8, 9, 12, 16, 22, 23, 38 | 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 | 65 | 1 | 25751 | BEF TUAS STH AVE 14 | 41 | 131 | 38 | 40 | POLYGON ((4220.122 28358.49... |
| 5, 6, 9, 10, 13, 14, 16, 17, 24, 25 | 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 | 99 | 1 | 26379 | YONG NAM | 50 | 278 | 78 | 84 | POLYGON ((4470.122 30523.55... |
| 1, 2, 8, 9, 12, 16, 22, 23, 38, 39 | 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 | 127 | 1 | 25761 | OPP REC S'PORE | 129 | 1689 | 246 | 454 | POLYGON ((4720.122 28358.49... |
| 3, 6, 9, 10, 12, 13, 14, 16, 17, 24 | 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 | 129 | 2 | 25719, 26389 | THE INDEX; BEF TUAS STH AVE 5 | 1104 | 2608 | 503 | 646 | POLYGON ((4720.122 30090.54... |
| 3, 5, 7, 9, 10, 13, 14, 17, 18, 25 | 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 | 130 | 1 | 26369 | SEE HUP SENG | 34 | 185 | 117 | 27 | POLYGON ((4720.122 30956.57... |
# Ensure consistent results
set.seed(5555)
# Assuming 'bus_traffic_09' is a spatial dataset with a 'geometry' column
# List of columns for peak traffic analysis
bus_traffic_columns <- c("peak_weekday_morning", "peak_weekday_evening", "peak_weekend_morning", "peak_weekend_evening")
# Function to calculate global Moran's I
calculate_moran_i <- function(dataset, column_name, neighbors) {
# Validate that the dataset is properly structured as a spatial data frame
if (!("geometry" %in% names(dataset))) {
stop("The dataset does not have a geometry column.")
}
# Validate that the column_name exists in the dataset
if (!(column_name %in% names(dataset))) {
stop(paste("Column", column_name, "not found in the dataset."))
}
# Establishing spatial relationships
neighbors_list <- st_knn(dataset$geometry, k = neighbors)
weights <- st_weights(neighbors_list, style = 'W')
# Executing Moran's I calculation
moran_test_result <- global_moran_perm(dataset[[column_name]], neighbors_list, weights, nsim = 99)
# Organizing the output
output <- list(
column = column_name,
moran_i = moran_test_result
)
return(output)
}
# Running the Moran's I calculation for each traffic time slot
moran_i_results <- lapply(bus_traffic_columns, function(column) calculate_moran_i(bus_traffic_09, column, 10))
# Displaying the calculated results
print(moran_i_results)
[[1]]
[[1]]$column
[1] "peak_weekday_morning"
[[1]]$moran_i
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.21341, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
[[2]]
[[2]]$column
[1] "peak_weekday_evening"
[[2]]$moran_i
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.061867, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
[[3]]
[[3]]$column
[1] "peak_weekend_morning"
[[3]]$moran_i
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.15841, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
[[4]]
[[4]]$column
[1] "peak_weekend_evening"
[[4]]$moran_i
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.11109, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
For each of the four time slots, the p-value of the Global Moran's I permutation test is below 0.05, leading us to reject the null hypothesis of spatial randomness.
Additionally, the positive Moran's I statistics imply a tendency towards clustering in the spatial distribution across all time intervals.
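For reporting, the list of test objects can be flattened into a single table; a sketch that assumes each element of moran_i_results carries a column name plus a moran.mc-style result with statistic and p.value fields (the toy object below mimics that structure):

```r
# Collect each period's Moran's I statistic and pseudo p-value into a
# data frame; the $statistic / $p.value structure is an assumption
# based on spdep's moran.mc-style output
summarise_moran <- function(results) {
  do.call(rbind, lapply(results, function(r) {
    data.frame(
      period  = r$column,
      moran_i = unname(r$moran_i$statistic),
      p_value = r$moran_i$p.value
    )
  }))
}

# Toy result mimicking the structure produced by calculate_moran_i()
toy <- list(list(column = "peak_weekday_morning",
                 moran_i = list(statistic = c(statistic = 0.213),
                                p.value = 0.01)))
summarise_moran(toy)
```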
Investigating spatial patterns at a detailed level, we utilize the Local Moran’s I analysis to identify specific areas with strong or weak spatial associations. The methodology categorizes regions based on clustering types like high-high or low-low, revealing areas of similar or contrasting characteristics. We apply the local_moran function from the sfdep package for examining passenger trips at a hexagonal level, which calculates neighbors and weights automatically and uses simulated data for precision.
# Create a function to perform local Moran's I analysis
get_lisa <- function(weightmat_all, bus_traffic_column, k) {
# Check if the bus_traffic_column is valid
if (nchar(bus_traffic_column) == 0) {
stop("The column name provided is empty.")
}
# Compute spatial weights
nb <- st_knn(weightmat_all$geometry, k = k)
wt <- st_weights(nb, style = 'W')
# Perform local Moran's I analysis and create a new data frame
result <- weightmat_all %>%
mutate(
local_moran = local_moran(.data[[bus_traffic_column]], nb, wt, nsim = 99),
.before = 1
) %>%
unnest(local_moran)
return(result)
}
# Assuming bus_traffic_columns is a vector of column names
bus_traffic_columns <- c("peak_weekday_morning", "peak_weekday_evening", "peak_weekend_morning", "peak_weekend_evening")
# Initialize a list to store results for each bus_traffic_column
lisa_results <- list()
# Apply the function for each bus traffic time interval and store results in the list
for (btc in bus_traffic_columns) {
lisa_results[[btc]] <- get_lisa(weightmat_all, btc, k = 10)
# Remove columns that don't belong to the specific time interval
unwanted_columns <- setdiff(bus_traffic_columns, btc)
lisa_results[[btc]] <- lisa_results[[btc]][, !(names(lisa_results[[btc]]) %in% unwanted_columns)]
}
# Show sample output in an interactive table
datatable(lisa_results[["peak_weekday_morning"]],
class = 'cell-border stripe',
options = list(pageLength = 5))
We now utilize the tmap package to generate choropleth maps that display Local Moran's I values and corresponding p-values for the four time intervals. The maps highlight only Local Moran's I values that are significant at the 5% level; the code below extracts these significant values for map creation.
get_sig_lmi_map <- function(lisa_table, title) {
sig_lisa_table <- lisa_table %>%
filter(p_ii_sim < 0.05)
result <- tm_shape(lisa_table) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(sig_lisa_table) +
tm_fill("ii", palette = "Reds") +
tm_borders(alpha = 0.4) +
tm_layout(
main.title = title,
main.title.size = 1.3
)
return(result)
}
# Applying the function to create maps for different peak intervals
sig_lmi_1 <- get_sig_lmi_map(lisa_results[["peak_weekday_morning"]], "Weekday Morning")
sig_lmi_2 <- get_sig_lmi_map(lisa_results[["peak_weekday_evening"]], "Weekday Evening")
sig_lmi_3 <- get_sig_lmi_map(lisa_results[["peak_weekend_morning"]], "Weekend Morning")
sig_lmi_4 <- get_sig_lmi_map(lisa_results[["peak_weekend_evening"]], "Weekend Evening")
tmap_mode('view')
tmap_arrange(
sig_lmi_1,
sig_lmi_2,
sig_lmi_3,
sig_lmi_4,
asp = 2,
nrow = 2,
ncol = 2
)